ABSTRACT

In markets for online advertising, some advertisers pay only
when users respond to ads. So publishers estimate ad response
rates and multiply by advertiser bids to estimate expected
revenue for showing ads. Since these estimates may be inaccurate,
the publisher risks not selecting the ad for each ad call that
would maximize revenue. The variance of revenue can be decomposed
into two components - variance due to ‘uncertainty’ because the
true response rate is unknown, and variance due to ‘randomness’
because realized response statistics fluctuate around the true
response rate. Over a sequence of many ad calls, the variance due
to randomness nearly vanishes due to the law of large numbers.
However, the variance due to uncertainty doesn’t diminish.

We introduce a technique for ad selection that augments
existing estimation and explore-exploit methods. The technique
uses methods from portfolio optimization to produce a
distribution over ads rather than selecting the single ad that
maximizes estimated expected revenue. Over a sequence of
similar ad calls, ads are selected according to the distribution.
This approach decreases the effects of uncertainty and
increases revenue.

Keywords 

online advertising, portfolio allocation, uncertainty, randomness

1. INTRODUCTION 

In recent years, online advertising has emerged as a hugely
profitable industry for publishers and advertisers alike on
the Internet. Today’s online advertiser faces two choices for
placing their ads - display advertising and search advertising.
Display advertising is most similar to traditional advertising,
in that ads are placed in popular websites alongside
the published content. An example would be an airline ad
on the travel section of a news website. Search advertising,
on the other hand, involves ads being shown alongside search
results on certain keywords, on search websites. An example
would be an airline ad alongside the list of search results for
the keyword ‘travel’. Online publishers (in both display and
search advertising) want to maximize the revenue generated
from selling real estate (‘impressions’ or ‘ad calls’) on their
websites to advertisers.


In the traditional model, advertisers pay publishers just for
showing their ads. This is the CPM (cost per mille) model,
where advertisers incur a cost per thousand impressions for
which their ads are selected. In this model, the entire advertising
risk is borne by the advertisers, because the publishers
are guaranteed revenue for showing ads. Alternative models
have since emerged that split the risk more evenly between
the publisher and the advertiser. For example, in the CPC
(cost per click) model, an advertiser pays only when a user
clicks on their ad, causing their landing page to be displayed
in the user’s web browser. Clearly, the risk is split - a low
click volume is bad for the publisher, but minimizes the
advertiser’s costs, whereas a high click volume with poor
conversion is great for the publisher, but bad for the advertiser
due to low return on investment. The CPA (cost per action)
model is the other extreme, where the entire risk is borne
by the publisher - an advertiser typically pays only when a
user completes an action that guarantees a conversion.
Examples include users navigating to a specified page on the
advertiser’s web site, filling out a web form that submits
contact information to the advertiser (called lead generation),
and completing an online purchase. For more about online
advertising markets, refer to Varian [49, 50], Edelman et al.
[17], and Lahaie and Pennock [35].


Advertisers submit bids for their ads to the publisher,
indicating how much they are willing to pay. A publisher
usually partitions available ad calls into different segments,
called markets, and selects ads to be shown for each market.
When a publisher is faced with several ads to choose
from, the revenue optimizing choice is most obvious if all
ads are CPM ads - there is a wide range of literature that
deals with optimal auctions, refer to Riley and Samuelson
[44], Myerson [41], and the textbook by Krishna [34]; most
notable among these is the generalized second price auction,
first analyzed by Edelman et al. and Varian. If
CPC and/or CPA ads are involved, then the function that is
optimized by these auction mechanisms is the expected revenue,
which is the advertiser payment times user response
rate (commonly referred to as clickthrough rate (CTR) in
the context of CPC ads). Since the response rates of the
ads may not be known accurately, the expected revenue has
to be estimated. We note that the mechanism used by the
publisher to select ads affects the incentives of the competing
advertisers, which in turn affects their bidding behavior.
We do not consider these effects in this paper; Li et al. [37]
discuss a pricing mechanism for portfolio allocation that
takes them into account.


Selecting a single ad that would result in the maximum
estimated expected revenue and showing it on all ad calls in a
market is risky for the publisher, because the actual revenue
may be less than the estimated expected revenue. To our
knowledge, most existing techniques for reducing this risk
use samples from historical performance data for the ads to
obtain improved response rate estimates, see Richardson et
al., Agarwal et al., and Graepel et al. Even
though such methods hold much merit, there is an inherent
difficulty involved in ‘learning’ the response rates in this
manner, especially when these response rates are very small,
which is usually the case, see [22].


In this paper, we outline a new approach for selecting ads
that can augment existing techniques that involve intelligent
learning of the response rates, to provide better risk
management for the publisher. This approach involves selecting
a portfolio of ads to share the ad calls in the market, so as
to reduce the variance of revenue. We use techniques from
portfolio optimization for this purpose. Portfolio optimization
itself is not a new idea; it has existed in the world of
finance for decades. For more on portfolio allocation
techniques, refer to the text by Fabozzi [18], or the papers by
Markowitz [40], Lintner [39], Sharpe [45], and Tobin [48].


2. FORMAL MODEL

Let $m$ be the number of ad calls and let $n$ be the number
of ads in a market. An allocation vector $\mathbf{k} = (k_1, \ldots, k_n)$
specifies the number of ad calls to allocate to each ad. The
goal is to select an optimal allocation vector $\mathbf{k}^*$ that
mediates a tradeoff between maximizing expected revenue and
minimizing variance of revenue.

Assume each ad $i$ in the market is generated by a distribution
over possible ads, each of which has a response rate. Let
$S_i$ be the random variable that denotes the response rate of
ad $i$. Let $R_i$ be the distribution of $S_i$. Let $X_i(S_i)$ be the
random variable that denotes the revenue from showing ad
$i$ on an ad call. For example, if an advertiser pays his bid $b$
only when the ad elicits a response, then

$$X_i(S_i) = \begin{cases} b & \text{with probability } S_i \\ 0 & \text{with probability } 1 - S_i \end{cases} \tag{1}$$

Define random variables $X_{hi}(S_i)$, for $h$ in $\{1, \ldots, m\}$ and $i$ in
$\{1, \ldots, n\}$, to be the revenue $X_i(S_i)$ if ad call $h$ is allocated
to ad $i$. (Random variables $X_{hi}(S_i)$ are independent copies
of $X_i(S_i)$.) Response rate $S_i$ is drawn once (according to
$R_i$) and determines a distribution for revenue for all copies
of $X_i(S_i)$: $X_{1i}, \ldots, X_{mi}$. But $X_{hi}(S_i)$ is redrawn i.i.d.
according to that distribution for each ad call $h$. Think of it as
drawing a coin from a bag of coins to determine the response
probability $S_i$ for ad $i$, then tossing that same coin once for
each ad call to determine the values $X_{1i}, \ldots, X_{mi}$. In terms
of this notation, ‘uncertainty’ is tied to the distribution $R_i$
that generates the response rates, whereas ‘randomness’ is
tied to the distribution $X_i(S_i)$ that determines the revenue
for a given response rate.
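
The coin-from-a-bag picture can be checked with a small simulation. The following Python sketch is an illustration only, not part of the original analysis: the two-point prior (response rate 0.1 or 0.3 with equal probability), the bid, the allocation size, and all function names are assumptions chosen to keep the arithmetic easy to follow. It draws the response rate once per ad, draws revenue independently per ad call, and compares the empirical variance of total revenue with the sum of a term for uncertainty and a term for randomness.

```python
import random
import statistics


def draw_rate(rng):
    # Assumed prior R_i for illustration: rate is 0.1 or 0.3 with equal probability.
    return rng.choice([0.1, 0.3])


def simulate_revenue(k, bid, rng):
    """Total revenue from k ad calls allocated to a single ad."""
    s = draw_rate(rng)  # uncertainty: S_i is drawn once for the ad
    # randomness: each of the k ad calls pays the bid with probability s
    return sum(bid for _ in range(k) if rng.random() < s)


rng = random.Random(0)
k, bid = 50, 1.0
revenues = [simulate_revenue(k, bid, rng) for _ in range(20000)]

var_s = 0.5 * (0.1 - 0.2) ** 2 + 0.5 * (0.3 - 0.2) ** 2   # Var(S_i) = 0.01
mean_s_one_minus_s = 0.5 * (0.1 * 0.9 + 0.3 * 0.7)         # E[S_i (1 - S_i)] = 0.15
uncertainty = k ** 2 * bid ** 2 * var_s                     # grows with k squared
randomness = k * bid ** 2 * mean_s_one_minus_s              # grows linearly in k
empirical = statistics.pvariance(revenues)
print(empirical, uncertainty + randomness)
```

For this assumed prior the uncertainty term is 25 and the randomness term is 7.5, so most of the variance comes from drawing the coin rather than from tossing it.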


The variance in revenue is due to two factors, which we refer
to in this paper as ‘uncertainty’ and ‘randomness’.
Uncertainty refers to the fact that the true response rates are not
known and are estimated. Randomness refers to the fact
that the response rate, by definition, is still an ‘average’ -
it represents the probability of eliciting a response, and the
realized response statistics will fluctuate around it. The
variance due to randomness diminishes according to the law of
large numbers, as the number of allocated ad calls increases
(which happens over a large time period). The same cannot
be said for the variance due to uncertainty. Portfolio
allocation specifically targets the component of variance due to
uncertainty by diversifying, i.e., spreading the ad calls over
multiple ads. As a side-effect of reducing risk, our
simulations show that there is potential for increasing the actual
(realized) revenue, so even risk-neutral publishers benefit
from this approach.

Section 2 establishes a formal model. Section 3 describes
how to optimize for a combination of estimated expectation
and variance of revenue in our formal model. Section 4
analyzes the components of variance due to uncertainty
and randomness, and the impact of diversification. Section 5
uses simulations to illustrate that using portfolio allocation
has the potential to increase actual revenue. Section 6
concludes with a discussion of directions for future work.


The goal is to select an allocation $\mathbf{k}^*$ to maximize expected
return subject to controls on variance of returns. The
expectation and variance are over $S = (S_1, \ldots, S_n)$ and
$X = (X_{11}, \ldots, X_{1n}, \ldots, X_{m1}, \ldots, X_{mn})$. However, $S$ and $X$ are
unknown at the time of portfolio allocation, so their statistics
must be estimated.

In the sections that follow, expectations, variances, and
covariances are over the distributions of the random variables
in subscripts. For example, $E_S$ is expectation over the
distribution of $S$. Similarly, $\mathrm{Var}_{S,X}$ is variance over the joint
distribution of $(S, X)$.

3. ESTIMATED OPTIMAL ALLOCATION 

Let $\mathbf{k}$ be an allocation. Assume, without loss of generality,
that the first $k_1$ ad calls are allocated to ad 1, the next $k_2$ are
allocated to ad 2, and so on. Then the revenue for allocation
$\mathbf{k}$ is

$$r(\mathbf{k}, S, X) = \sum_{i=1}^{n} \; \sum_{h = k_1 + \cdots + k_{i-1} + 1}^{k_1 + \cdots + k_i} X_{hi}(S_i). \tag{2}$$

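So the allocation optimization problem (AOP) is:

$$\max_{\mathbf{k}} \; E_{S,X}\, r(\mathbf{k}, S, X) \tag{3}$$

subject to

$$\mathrm{Var}_{S,X}\, r(\mathbf{k}, S, X) \le d, \tag{4}$$

where $d$ is a specified bound on variance, and

$$\forall i: \; k_i \ge 0, \quad \text{and} \quad \sum_{i=1}^{n} k_i = m. \tag{5}$$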

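Since expectations are linear, the expected revenue is

$$E_{S,X}\, r(\mathbf{k}, S, X) = \sum_{i=1}^{n} k_i\, E_{S_i,X_i} X_i(S_i). \tag{6}$$

The variance of revenue is:

$$\mathrm{Var}_{S,X}\, r(\mathbf{k}, S, X) \tag{7}$$

$$= \sum_{i=1}^{n} \sum_{j=1}^{n} k_i k_j\, \mathrm{Cov}_{S_i,S_j}\!\left[E_{X_i} X_i(S_i),\, E_{X_j} X_j(S_j)\right] \tag{8}$$

$$+ \sum_{i=1}^{n} k_i\, E_{S_i} \mathrm{Var}_{X_i} X_i(S_i). \tag{9}$$

(See Appendix A for the proof.)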

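Define matrix $A$ as

$$a_{ij} = \mathrm{Cov}_{S_i,S_j}\!\left[E_{X_i} X_i(S_i),\, E_{X_j} X_j(S_j)\right], \tag{10}$$

and define vectors $\mathbf{b}$ and $\mathbf{c}$:

$$b_i = E_{S_i}\!\left[\mathrm{Var}_{X_i} X_i(S_i)\right] \quad \text{and} \quad c_i = E_{S_i,X_i} X_i(S_i). \tag{11}$$

(Please excuse the abuse of notation: the symbol $b$ represents
an advertiser’s bid elsewhere in this paper.) We now
relax the constraint that $k_i$ can take only integral values.
Then, the allocation optimization problem can be stated as

$$\max_{\mathbf{k}} \; \mathbf{c}^T \mathbf{k} \tag{12}$$

subject to

$$\mathbf{k}^T A \mathbf{k} + \mathbf{b}^T \mathbf{k} \le d, \tag{13}$$

$$\mathbf{k} \ge \mathbf{0} \quad \text{and} \quad \mathbf{1}^T \mathbf{k} = m. \tag{14}$$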

Call this the matrix allocation problem (MAP). This is a
convex quadratic programming problem, which can be solved
by any of a number of available quadratic programming
(QP) solvers, employing techniques such as Wolfe’s method
[56], which is covered by Franklin [20] and other texts on
linear and nonlinear programming.
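
For the pay-per-response revenue model of Equation (1), the conditional mean of revenue is the bid times the response rate and the conditional variance is the squared bid times $S_i(1 - S_i)$, so the inputs to MAP reduce to moments of the response rates. The Python sketch below shows one way to assemble them. It assumes that means and covariances of the response rates have already been estimated; the function name, argument names, and example numbers are hypothetical.

```python
def map_inputs(bids, rate_means, rate_cov):
    """Build (A, b_vec, c) of Equations (10)-(11) for pay-per-response ads.

    bids[i]        -- advertiser bid per response for ad i
    rate_means[i]  -- estimated E[S_i]
    rate_cov[i][j] -- estimated Cov(S_i, S_j); the diagonal holds Var(S_i)
    """
    n = len(bids)
    # a_ij = Cov(bid_i * S_i, bid_j * S_j) = bid_i * bid_j * Cov(S_i, S_j)
    A = [[bids[i] * bids[j] * rate_cov[i][j] for j in range(n)] for i in range(n)]
    # b_i = E[bid_i^2 * S_i * (1 - S_i)] = bid_i^2 * (E[S_i] - Var(S_i) - E[S_i]^2)
    b_vec = [
        bids[i] ** 2 * (rate_means[i] - rate_cov[i][i] - rate_means[i] ** 2)
        for i in range(n)
    ]
    # c_i = E[bid_i * S_i] = bid_i * E[S_i]
    c = [bids[i] * rate_means[i] for i in range(n)]
    return A, b_vec, c


# Hypothetical inputs: one CPC-like ad and one CPA-like ad with independent rates.
A, b_vec, c = map_inputs(
    bids=[1.0, 10.0],
    rate_means=[0.001, 0.0001],
    rate_cov=[[1e-8, 0.0], [0.0, 1e-10]],
)
print(A, b_vec, c)
```

The vector is named `b_vec` here to keep it apart from the bids, which the text also denotes by $b$.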

An alternative formulation uses a parameter $q \in [0, \infty)$ to
express how much to weight average returns versus variance.
Solve the problem

$$\min_{\mathbf{k}} \; \mathbf{k}^T A \mathbf{k} + \mathbf{b}^T \mathbf{k} - q\, \mathbf{c}^T \mathbf{k} \tag{15}$$

subject to

$$\mathbf{k} \ge \mathbf{0} \quad \text{and} \quad \mathbf{1}^T \mathbf{k} = 1. \tag{16}$$

Call this the $q$-weighted matrix allocation problem (QMAP).
This convex quadratic programming problem is in a form
that is convenient for many QP solvers. (For general background
on allocation problems, refer to Franklin [20] or another
text on mathematical programming and optimization.)
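
To make QMAP concrete, the following Python sketch solves a small instance by projected gradient descent onto the simplex. This is a stand-in chosen for brevity, not Wolfe’s method and not necessarily the solver used for the results in this paper. It assumes $A$ is symmetric and positive semidefinite; the function names, the iteration count, and the two-ad example are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {k : k >= 0, sum(k) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t  # the last j that passes this check gives the shift
    return [max(x - theta, 0.0) for x in v]


def solve_qmap(A, b_vec, c, q, iterations=2000):
    """Minimize k'Ak + b'k - q c'k subject to k >= 0 and sum(k) = 1."""
    n = len(c)
    # Step size 1/L, where L bounds the largest eigenvalue of the Hessian 2A.
    lipschitz = 2.0 * max(sum(abs(a) for a in row) for row in A) or 1.0
    k = [1.0 / n] * n  # start from the uniform allocation
    for _ in range(iterations):
        grad = [
            2.0 * sum(A[i][j] * k[j] for j in range(n)) + b_vec[i] - q * c[i]
            for i in range(n)
        ]
        k = project_to_simplex([k[i] - grad[i] / lipschitz for i in range(n)])
    return k


# Hypothetical two-ad market: ad 0 has higher expected revenue and higher uncertainty.
A = [[4.0, 0.0], [0.0, 1.0]]
b_vec = [0.0, 0.0]
c = [2.0, 1.0]
for q in (0.0, 2.0, 10.0):
    print(q, solve_qmap(A, b_vec, c, q))
```

In this example the allocation should move from roughly (0.2, 0.8) at $q = 0$, where only variance matters, through roughly (0.4, 0.6) at $q = 2$, to the single-winner allocation (1, 0) once $q$ is large.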

4. ANALYSIS OF VARIANCE OF REVENUE 

The ad call allocation problems MAP and QMAP have the
same form as the standard portfolio allocation problem in
finance. In finance, an investor seeks to allocate funds among
investments, with the goals of achieving high expected
returns and low variance of returns. In online advertising, a
risk-averse publisher seeks to allocate ad calls among ads,
with the goals of achieving high expected revenue and low
variance of revenue.

Like financial investors selecting a portfolio, using MAP and 
QMAP causes revenue-seeking, risk-averse publishers to: 

1. Allocate more ad calls to ads that have higher expected 
revenue. 

2. Allocate more ad calls to ads that have lower variance 
of revenue. 

3. Diversify: spread ad calls over multiple ads. 

4.1 Variance of revenue due to a single ad 

To begin, we focus on variance of returns due to a single ad
$i$ being allocated to $k_i$ ad calls in the market. To focus on
individual allocations, let us assume for now that covariances
with respect to all $S_i$ and $S_j$ are zero: $\forall i \ne j$, $a_{ij} = 0$. (This
occurs when ad response rates are estimated independently.)

In a portfolio allocation, the portion of variance in revenue
due to ad $i$ is then given by

$$\mathrm{Var}_i = k_i^2\, \mathrm{Var}_{S_i}\!\left[E_{X_i} X_i(S_i)\right] + k_i\, E_{S_i}\!\left[\mathrm{Var}_{X_i} X_i(S_i)\right]. \tag{17}$$

The first term is variance due to uncertainty. The second
term is variance due to randomness. Variance due to
uncertainty scales with the square of allocated ad calls $k_i$.
Variance due to randomness scales linearly. Uncertainty
increases with allocation size because the actual response rate
$S_i$ is drawn once and applies to all ad calls allocated to ad
$i$, making their revenues correlated. In contrast, deviations
in revenue due to differences between average and realized
response rates are independent from ad call to ad call.

We simplify the notation for the ensuing discussion as
follows. Let

• $k$ (instead of $k_i$) count ad calls allocated to ad $i$,

• $b$ be the advertiser’s bid per response,

• $p$ be the ad’s (unknown) response rate, and

• $\sigma$ be the standard deviation of error in estimating $p$.


Assuming an unbiased estimate of $p$, and assuming that the
advertiser pays his bid when a user responds to his ad,
Equation (17) gives:

$$\mathrm{Var}_i \approx k^2 b^2 \sigma^2 + k b^2 p (1 - p). \tag{18}$$

Expected revenue per ad call is $bp$. Let $c$ be the expected
revenue required for the ad’s offer to be competitive. Then $b$
needs to be at least $c/p$. Substitute into Approximation (18):

$$\mathrm{Var}_i \approx \frac{k^2 c^2 \sigma^2}{p^2} + \frac{k c^2 (1 - p)}{p}. \tag{19}$$

As ads are shown to users and response data is collected,
uncertainty about response rates decreases. Suppose an ad has
actual response rate $p$ and obtains $u$ responses from being
allocated to $v$ ad calls. Treating each ad call as a Bernoulli
trial [19] with success probability $p$, $u/v$ is an unbiased
estimator of $p$, with standard deviation

$$\sqrt{\frac{p (1 - p)}{v}}. \tag{20}$$

Substitute $\sigma = \sqrt{p(1 - p)/v}$ into Approximation (19):

$$\mathrm{Var}_i \approx \frac{k^2 c^2 (1 - p)}{v p} + \frac{k c^2 (1 - p)}{p}. \tag{21}$$

Since the first term accounts for uncertainty and the second
for randomness, the ratio of uncertainty to randomness is
about $k : v$. For example, if the number of learning ad
calls is about nine times the current-session allocation $k$,
then uncertainty accounts for about 10% of the variance
in revenue. In general, an ad contributes more variance to
revenue when its allocation $k$ is larger, when the response
rate $p$ is smaller, and when fewer learning ad calls $v$ have
been used to estimate the response rate.

4.2 Effect of diversification on variance of revenue

Suppose there are $r$ ads available, with independent and
identical sets of distributions $S_i$ and $X_i(S_i)$. Then all
allocations $(k_1, \ldots, k_r)$ of $k$ ad calls have the same expected
revenue. Independence implies

$$\forall i \ne j: \; k_i k_j\, \mathrm{Cov}_{S_i,S_j}\!\left[E_{X_i} X_i(S_i),\, E_{X_j} X_j(S_j)\right] = 0. \tag{22}$$

So the variance of revenue is

$$\sum_{i=1}^{r} \left( k_i^2\, \mathrm{Var}_{S_i}\!\left[E_{X_i} X_i(S_i)\right] + k_i\, E_{S_i}\!\left[\mathrm{Var}_{X_i} X_i(S_i)\right] \right). \tag{23}$$

Let the uncertainty terms

$$\mathrm{Var}_{S_1}\!\left[E_{X_1} X_1(S_1)\right] = \cdots = \mathrm{Var}_{S_r}\!\left[E_{X_r} X_r(S_r)\right] = \sigma. \tag{24}$$

Let the randomness terms

$$E_{S_1}\!\left[\mathrm{Var}_{X_1} X_1(S_1)\right] = \cdots = E_{S_r}\!\left[\mathrm{Var}_{X_r} X_r(S_r)\right] = \rho. \tag{25}$$

If all ad calls are allocated to a single ad, then the variance
is $k^2 \sigma + k \rho$. If the ad calls are distributed uniformly over
the $r$ ads, then the variance is $\frac{1}{r} k^2 \sigma + k \rho$. So diversification
reduces variance caused by uncertainty.
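
The two variance calculations of Sections 4.1 and 4.2 can be checked numerically. The Python sketch below evaluates the two terms of Approximation (21) for the example in which learning ad calls are nine times the allocation, and then compares a single-ad allocation with an even split over several identically distributed ads. The function names and all numeric inputs are hypothetical, chosen only to illustrate the scaling.

```python
def single_ad_variance(k, c, p, v):
    """Return the (uncertainty, randomness) terms of Approximation (21)."""
    uncertainty = k ** 2 * c ** 2 * (1 - p) / (v * p)
    randomness = k * c ** 2 * (1 - p) / p
    return uncertainty, randomness


def uniform_split_variance(k, r, uncertainty_term, randomness_term):
    """Variance when k ad calls are split evenly over r i.i.d. ads (Section 4.2)."""
    per_ad = k / r
    return r * (per_ad ** 2 * uncertainty_term + per_ad * randomness_term)


# Learning calls v are nine times the allocation k, as in the example in the text.
unc, rnd = single_ad_variance(k=1000, c=1.0, p=0.001, v=9000)
share = unc / (unc + rnd)  # fraction of variance due to uncertainty

# Hypothetical per-ad terms; only the uncertainty part shrinks as r grows.
single = uniform_split_variance(k=1000, r=1, uncertainty_term=1e-8, randomness_term=1e-3)
spread = uniform_split_variance(k=1000, r=10, uncertainty_term=1e-8, randomness_term=1e-3)
print(share, single, spread)
```

With these inputs the uncertainty share is about 0.10, and spreading the ad calls over ten ads divides the uncertainty part of the variance by ten while leaving the randomness part unchanged.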

4.3 Covariance 

Uncertainty is completely correlated among ad calls
allocated to the same ad, and it may also be correlated between
different ads. In practice, uncertainty becomes correlated
when ads “share” learning. For example, in a tree-based
model for learning response rates, the ad calls and responses
for each ad may influence response rate estimates for other
ads in the same branch of the tree. As a result, differences
between estimated and actual expected response rates are
likely to become correlated for ads that are neighbors in the
tree.

Empirical data can be used to estimate covariance of
expected returns among ads. A model for the covariance can
be based on whether ads share a branch or sub-branch in a
tree model for response rate estimation (see Agarwal et al.,
Dudik et al., and Gelman and Hill), are in the
same cluster in a cluster model (see Regelson and Fain [42]),
have similar scores for factors in a factor-based model (see
Agarwal and Chen, Weinberger et al., and Richardson
et al. [43]), or use the same rules in a rule-based model
(see Dembczynski et al. [15]). The model for covariance can
be trained by using converged response rate estimates for a
population of experienced ads as proxies for actual response
rates and observing how differences between early estimates
and converged estimates are correlated for ads that “share”
learning.

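
As one possible reading of this training procedure, the Python sketch below computes the sample covariance of estimation errors across pairs of ads that share learning, using converged estimates as proxies for actual response rates. The pairing of ads, the function name, and the numbers are hypothetical; a practical model would condition on how closely the ads are related.

```python
def error_covariance(early_a, converged_a, early_b, converged_b):
    """Sample covariance of early-estimate errors over paired ads.

    Position t in each list refers to the t-th pair of ads (a, b) that
    share learning, for example neighbors in a tree model.
    """
    err_a = [e - c for e, c in zip(early_a, converged_a)]
    err_b = [e - c for e, c in zip(early_b, converged_b)]
    n = len(err_a)
    mean_a = sum(err_a) / n
    mean_b = sum(err_b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(err_a, err_b)) / (n - 1)


# Hypothetical early and converged response rate estimates for four ad pairs.
early_a = [0.0012, 0.0009, 0.0015, 0.0008]
converged_a = [0.0010, 0.0010, 0.0010, 0.0010]
early_b = [0.0011, 0.0009, 0.0014, 0.0008]
converged_b = [0.0010, 0.0010, 0.0010, 0.0010]
cov_ab = error_covariance(early_a, converged_a, early_b, converged_b)
print(cov_ab)
```

A positive value indicates that early overestimates for one ad in a pair tend to coincide with early overestimates for the other, which is the correlation in uncertainty that the matrix $A$ is meant to capture.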

5. SIMULATIONS

This section uses simulations to show that using portfolio
allocation to decrease estimated variance of revenue can cause
an increase in the actual expected revenue. The simulations
focus on markets for display advertising that have a mix of
CPC and CPA ads. These results should also apply to
markets that have only CPA ads, with a variety of definitions
of an action and hence a variety of response rates. This
is the typical case for CPA-dominated markets in display
advertising.

In the simulations, ad response rates $S_i$ are independent of
each other. For each ad $i$, let $R_i$ be the actual prior
distribution for $S_i$. Let $\hat{R}_i$ be the estimated prior distribution for $S_i$.
Such a distribution can be based on historical response rates
for ads. Using Bayesian statistics, let $D_i$ be the estimated
posterior distribution for $S_i$. Statistics for this estimated
posterior can be computed from the estimated prior $\hat{R}_i$ and
the past performance of ad $i$. (Refer to Berger or another
text on Bayesian statistics for methods to compute statistics
for the estimated posterior.)

Each simulation follows the steps: 

1. Generate ad response rates $S_i$ at random based on
actual priors $R_i$.

2. For each ad, randomly generate a series of 100,000
“learning” ad calls, with response rate $S_i$, and record
the number of responses.

3. For each ad, compute an estimated posterior $D_i$ based
on the ad’s estimated prior $\hat{R}_i$ and the number of
responses to learning ad calls. (For the method to do
this, refer to a textbook on Bayesian statistics, such as
Berger.)

4. Use QMAP to allocate ad calls over the ads, based on
statistics over the estimated posteriors $D_i$.

5. Record the actual expected revenue, $\sum_i k_i S_i b_i$, where
$k_i$ is the number of ad calls allocated to ad $i$ and $b_i$ is
the bid for ad $i$. This is the expected revenue achieved by
the QMAP allocation.

6. For comparison, identify a “single winner” ad - an ad
with maximum estimated revenue based on the estimated
posteriors $D_i$. (The revenue estimate is the bid
times the mean of the estimated posterior.) This is
the ad that would be selected to receive all ad calls
without portfolio allocation, based on the available
information. Record its actual expected revenue.

7. Also for comparison, record the ideal expected revenue,
$\max_i S_i b_i m$, where $b_i$ is the bid for ad $i$. This is the
expected revenue if perfect knowledge of response rates
could be used to select an ad with maximum actual
expected revenue.

Each simulation computes the QMAP allocation and collects 
results for all q values in 0 to 1500 with increments of 25, 
then the values 1750, 2000, 3000, 4000, 5000, 7500, 10,000, 
15,000, and 20,000. Each plot shows results averaged over 
10,000 simulations. 


Each simulation uses 20 ads: 10 CPC ads and 10 CPA
ads. The CPC ads have \$1 bids and actual priors
$R_i = \mathcal{N}(0.001, 0.0001)$ - Gaussians with mean 0.001 and standard
deviation 0.0001. The CPA ads have \$10 bids and actual priors
$R_i = \mathcal{N}(0.0001, 0.00001)$, so that their revenues have the
same distributions as the CPC ad revenues.

We ran simulations using three possible estimated priors $\hat{R}_i$ for
response rates:

• Uniform - The prior is uniform over [0, 1]. This
simulates estimating response rates for each ad based on its
own performance history, without using a prior based
on histories of other ads.

• Approximate - The prior is uniform over $[\mu - 4\sigma, \mu + 4\sigma]$,
where $\mu$ and $\sigma$ are the mean and standard deviation
of the actual priors $R_i$. This simulates using a
prior based on histories of other similar ads in
combination with each ad’s own performance history.

• Exact - The prior is the actual distribution used to
generate response rates, that is, $\mathcal{N}(0.001, 0.0001)$ for
CPC ads and $\mathcal{N}(0.0001, 0.00001)$ for CPA ads. This
simulates having exact knowledge of response rate
distributions, an ideal that does not occur in practice.
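
For the Uniform and Approximate priors, the estimated posterior is proportional to the binomial likelihood restricted to the prior’s interval, so its mean and variance can be approximated on a grid. The Python sketch below is one way to do this and is not taken from the simulations reported here; the function name, the midpoint grid, and the example counts are assumptions.

```python
import math


def posterior_stats(u, v, lo=0.0, hi=1.0, grid=10_000):
    """Posterior mean and variance of a response rate after u responses in
    v ad calls, under a prior that is uniform on [lo, hi]."""
    width = (hi - lo) / grid
    points = [lo + (j + 0.5) * width for j in range(grid)]  # midpoints avoid log(0)
    log_like = [u * math.log(s) + (v - u) * math.log1p(-s) for s in points]
    peak = max(log_like)
    weights = [math.exp(value - peak) for value in log_like]  # rescaled to avoid underflow
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, points)) / total
    second_moment = sum(w * s * s for w, s in zip(weights, points)) / total
    return mean, second_moment - mean ** 2


# Uniform prior on [0, 1]: the exact posterior is Beta(u + 1, v - u + 1).
print(posterior_stats(3, 10))

# Approximate prior for a CPC ad: uniform on [mu - 4 sigma, mu + 4 sigma].
print(posterior_stats(100, 100_000, lo=0.0006, hi=0.0014))
```

When the posterior is much narrower than the interval, as with a Uniform prior and many learning ad calls, the grid must be fine enough to resolve it; for the Uniform prior on [0, 1] the closed-form Beta moments are the simpler choice.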

Figure 1 compares estimated expected revenue for portfolio
allocations and single-winner allocations. As $q$ increases,
indicating a preference for revenue maximization over
variance minimization, estimated expected revenue for portfolio
allocations approaches that for single winners, as expected.
The estimated expected revenue is shown as a fraction of
ideal expected revenue - the expected revenue that would be
realized using a single winner if response rates were known.
With inexact priors, estimated expected revenues can
exceed actual ideal revenues, because there is a tendency to
select ads with overestimated response rates. (For more on
this “seller’s curse” effect, refer to Bax et al.)
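
The seller’s curse can be reproduced in a reduced version of the simulation. The Python sketch below covers only steps 1 to 3, 6, and 7 (it omits the QMAP allocation), assumes the Uniform estimated prior so that the posterior mean is $(u + 1)/(v + 2)$, and uses fewer learning ad calls and fewer repetitions than described above so that it runs quickly. Revenues are per ad call. The function name, the seed, and these reduced settings are assumptions.

```python
import random


def run_simulation(rng, learning_calls=10_000):
    """One reduced simulation: returns (estimated, actual, ideal) revenue per ad call."""
    # (bid, prior mean, prior std): 10 CPC ads and 10 CPA ads, as in the text.
    ads = [(1.0, 0.001, 0.0001)] * 10 + [(10.0, 0.0001, 0.00001)] * 10
    estimated, actual = [], []
    for bid, mu, sd in ads:
        s = max(rng.gauss(mu, sd), 0.0)                            # step 1: actual rate
        u = sum(rng.random() < s for _ in range(learning_calls))   # step 2: responses
        post_mean = (u + 1) / (learning_calls + 2)                 # step 3: Beta posterior mean
        estimated.append(bid * post_mean)
        actual.append(bid * s)
    winner = max(range(len(ads)), key=lambda i: estimated[i])      # step 6: single winner
    ideal = max(actual)                                            # step 7: ideal revenue
    return estimated[winner], actual[winner], ideal


rng = random.Random(1)
results = [run_simulation(rng) for _ in range(10)]
avg_est = sum(r[0] for r in results) / len(results)
avg_act = sum(r[1] for r in results) / len(results)
avg_ideal = sum(r[2] for r in results) / len(results)
print(avg_est / avg_ideal, avg_act / avg_ideal)
```

The first printed ratio is the single winner’s estimated revenue as a fraction of ideal revenue and the second is its actual revenue as a fraction of ideal; the second can never exceed 1, while the first is not bounded in the same way.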

Figures 2, 3, and 4 show actual, not estimated, expected
revenues. They illustrate that using portfolio
allocation to control estimated variance can make actual
expected revenue higher than for selecting a single ad that
maximizes estimated expected revenue. Controlling
estimated variance counters the seller’s curse of selecting a single
winner with overestimated response rate and hence
overestimated expected revenue. As in Figure 1, revenues are shown
as fractions of ideal expected revenue. Of course, actual
expected revenues do not reach ideal expected revenues, even
with exact priors.

Figure 1: Estimated Expected Revenue
Figure 2: Actual Expected Revenue - Uniform Prior
Figure 3: Actual Expected Revenue - Approximate Prior
Figure 4: Actual Expected Revenue - Exact Prior

These results suggest two steps to improve exchanges for
online advertising:


• Use a Bayesian approach to estimate response
probabilities, making the priors as accurate as possible using
the best available learning/estimation methods.
(Use historical data to categorize ads, deduce the
histograms or functional forms of priors, and fit any
parameters.)

• Apply portfolio optimization, experimenting to set $q$
to optimize for a combination of actual expected
revenue and, if desired, actual variance of revenue. (For
methods to adjust parameters using statistical
experiments, refer to Box et al. [10] or another text on the
design of experiments.)

6. DISCUSSION 

This paper describes a technique to allocate inventory among
buyers in online advertising that mediates a tradeoff
between maximizing estimated expected revenue and minimizing
estimated variance of revenue (risk). The estimated
variance accounts for both uncertainty in estimated response
rates and randomness around true response rates.
Simulations show that this technique can increase actual expected
revenue by preventing the exchange from selecting a single
winner based on an overestimate of its payoff.


One direction for future work is to extend the portfolio
allocation technique to operate in concert with an
explore-exploit method. (For more on explore-exploit methods, also
called multi-armed bandit techniques, refer to Cesa-Bianchi and
Lugosi, Auer et al., Langford et al., Gittins [24, 25],
and Gittins and Jones.) The technique in this paper
can play the role of an exploitation method, offering the side
benefit of performing some exploration by allocating ad calls
to multiple ads. One potential approach to extend the
portfolio allocation method to include systematic exploration is
to add terms to the ad revenue expectations and variances
to account for distributions of future revenues due to
learning more about response rates. For more on the value of
learning, refer to Vermorel and Mohri, and to Li et al.


Exploration is investing ad calls to learn whether ads’
response rates warrant exploiting them by including them in
future portfolio allocations. In a sense, exploration to
determine an ad’s revenue statistics is investing in the option to
exploit the ad should it be determined to contribute value
to the portfolio. In future research, it would be interesting
to explore how this is similar to investing in a call option
in a financial market. (For more on options, refer to Hull
[30].) In both cases, an upfront investment secures a right to
decide whether to make another investment after more
information is obtained. In a financial market, the information
is revealed over time. In online advertising, the investment
buys the information. In both cases, the downside risk is
limited to the amount of the initial investment. In a
financial market, this occurs when an option is out of the money.


Another direction for future work is to extend methods to
accommodate uncertainty in portfolio analysis for financial
markets to portfolio analysis for online advertising markets.
(For some methods, refer to Jorion [33], Jobson et al. [32],
and Vasicek.) It should be useful to apply James-Stein
corrections or similar shrinkage methods to estimates of the
means, variances, and covariances of revenue distributions
for ads. (For more on shrinkage methods, refer to Bock,
Brown, Stein, and James and Stein [31].)


The field of robust optimization focuses on optimization
under uncertainty. Some robust optimization approaches
address strict uncertainty (see Sniedovich [46]), where the
probabilities of possible outcomes are completely unknown.
Others address less uncertain problems, where the
distribution over possible outcomes is unknown but restricted to
some set of distributions (see Ben-Tal and Nemirovski,
Ben-Haim, and Chen et al.). In this paper, we began
by examining the effect on risk (defined by French)
of drawing a distribution over outcomes (corresponding to
$S$) from a distribution over distributions (corresponding to
$R$). Then we used simulations to explore the effect of having
imperfect information about the distribution over
distributions. This introduces a form of uncertainty beyond risk,
but not as severe as strict uncertainty. In the future, it
would be interesting to apply the methods of robust
optimization to ad allocation problems with uncertainty about
the parameters (such as the number of ad calls available)
as well as the payoffs. For more on robust optimization for
portfolio problems, refer to Fabozzi et al. [18] and Goldfarb
and Iyengar [27].


Finally, it would be interesting to analyze questions about
bidding behavior under portfolio allocations. For example,
could some advertisers benefit by switching price types from
CPC to CPA or vice versa to adjust the variance of their
ads’ revenues under portfolio allocation? Under a mixed
allocation, there are multiple winners, so what is the second
price? One answer to the latter question is VCG
(Vickrey-Clarke-Groves [14, 29, 54]) pricing for portfolio
allocations, as outlined by Li et al. [37]. But there may be
other approaches that maintain incentives to bid truthfully
and increase revenue.


APPENDIX 

A. VARIANCE OF ALLOCATION PAYOFF 

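Theorem 1.

$$\mathrm{Var}_{S,X}\, r(\mathbf{k}, S, X) \tag{26}$$

$$= \sum_{i=1}^{n} \sum_{j=1}^{n} k_i k_j\, \mathrm{Cov}_{S_i,S_j}\!\left[E_{X_i} X_i(S_i),\, E_{X_j} X_j(S_j)\right] \tag{27}$$

$$+ \sum_{i=1}^{n} k_i\, E_{S_i} \mathrm{Var}_{X_i} X_i(S_i). \tag{28}$$

Proof. Use the well-known equality [19] for variance:
$\mathrm{Var} X = E X^2 - (E X)^2$.

$$\mathrm{Var}_{S,X}\, r(\mathbf{k}, S, X) = E_{S,X}\, r(\mathbf{k}, S, X)^2 - \left[E_{S,X}\, r(\mathbf{k}, S, X)\right]^2. \tag{29}$$

For the first term, separate expectations for $S$ and $X$, and
apply the equality $E X^2 = \mathrm{Var} X + (E X)^2$:

$$= E_S\left[\mathrm{Var}_X\, r(\mathbf{k}, S, X)\right] \tag{30}$$

$$+ E_S\left[\left(E_X\, r(\mathbf{k}, S, X)\right)^2\right] \tag{31}$$

$$- \left[E_{S,X}\, r(\mathbf{k}, S, X)\right]^2. \tag{32}$$

Now expand the three terms one at a time. For the first
term, use the definition of $r(\mathbf{k}, S, X)$:

$$E_S\left[\mathrm{Var}_X\, r(\mathbf{k}, S, X)\right] \tag{33}$$

$$= E_S\, \mathrm{Var}_X \sum_{i=1}^{n} \; \sum_{h = k_1 + \cdots + k_{i-1} + 1}^{k_1 + \cdots + k_i} X_{hi}(S_i). \tag{34}$$

Since payoffs are i.i.d. with respect to $X$,

$$E_S\left[\mathrm{Var}_X\, r(\mathbf{k}, S, X)\right] \tag{35}$$

$$= E_S \sum_{i=1}^{n} k_i\, \mathrm{Var}_{X_i} X_i(S_i) \tag{36}$$

$$= \sum_{i=1}^{n} k_i\, E_{S_i}\!\left[\mathrm{Var}_{X_i} X_i(S_i)\right]. \tag{37}$$

This is the last term on the RHS of the equation in the
statement of the theorem.

Next, expand the second term of Equation (32). Use the
definition of $r(\mathbf{k}, S, X)$.

$$E_S\left[\left(E_X\, r(\mathbf{k}, S, X)\right)^2\right] \tag{38}$$

$$= E_S \Big[ \Big( E_X \sum_{i=1}^{n} \; \sum_{h = k_1 + \cdots + k_{i-1} + 1}^{k_1 + \cdots + k_i} X_{hi}(S_i) \Big) \tag{39}$$

$$\times \Big( E_X \sum_{j=1}^{n} \; \sum_{g = k_1 + \cdots + k_{j-1} + 1}^{k_1 + \cdots + k_j} X_{gj}(S_j) \Big) \Big]. \tag{40}$$

Since payoffs are i.i.d. with respect to $X$,

$$= E_S \left[ \left( \sum_{i=1}^{n} k_i\, E_{X_i} X_i(S_i) \right) \left( \sum_{j=1}^{n} k_j\, E_{X_j} X_j(S_j) \right) \right]. \tag{41}$$

Distribute $E_S$ and multiply the sums term-by-term.

$$= \sum_{i=1}^{n} \sum_{j=1}^{n} k_i k_j\, E_S\!\left(E_{X_i} X_i(S_i) \cdot E_{X_j} X_j(S_j)\right). \tag{42}$$

Now expand the third term of Equation (32). Substitute in
the expectation of $r(\mathbf{k}, S, X)$ from Equation (6).

$$- \left[E_{S,X}\, r(\mathbf{k}, S, X)\right]^2 = - \left[ \sum_{i=1}^{n} k_i\, E_{S_i,X_i} X_i(S_i) \right]^2 \tag{43}$$

Expand the square.

$$= - \sum_{i=1}^{n} \sum_{j=1}^{n} k_i k_j \left(E_{S_i,X_i} X_i(S_i)\right) \left(E_{S_j,X_j} X_j(S_j)\right). \tag{44}$$

Apply the equality for covariance: $\mathrm{Cov}(X, Y) = E XY - (E X)(E Y)$.

$$= - \sum_{i=1}^{n} \sum_{j=1}^{n} k_i k_j \Big[ E_S\!\left(E_{X_i} X_i(S_i) \cdot E_{X_j} X_j(S_j)\right) \tag{45}$$

$$- \mathrm{Cov}_{S_i,S_j}\!\left(E_{X_i} X_i(S_i),\, E_{X_j} X_j(S_j)\right) \Big]. \tag{46}$$

Carry through the sign and separate the expectation and
covariance terms.

$$= - \sum_{i=1}^{n} \sum_{j=1}^{n} k_i k_j\, E_S\!\left(E_{X_i} X_i(S_i) \cdot E_{X_j} X_j(S_j)\right) \tag{47}$$

$$+ \sum_{i=1}^{n} \sum_{j=1}^{n} k_i k_j\, \mathrm{Cov}_{S_i,S_j}\!\left(E_{X_i} X_i(S_i),\, E_{X_j} X_j(S_j)\right). \tag{48}$$

The first term cancels Equation (42). The second term
completes the RHS of the statement of the theorem. □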